Abstract



In this study we consider rateless coding over discrete memoryless channels (DMC) 
with feedback. Unlike traditional fixed-rate codes, in rateless codes each codeword
is infinitely long, and the decoding time depends on the confidence level of
the decoder. Using rateless codes along with sequential decoding, and allowing 
a fixed probability of error at the decoder, we obtain results for several commu- 
nication scenarios. The results shown here are non-asymptotic, in the sense that 
the size of the message set is finite. 

First we consider the transmission of equiprobable messages using rateless 
codes over a DMC, where the decoder knows the channel law. We obtain an 
achievable rate for a fixed error probability and a finite message set. We show 
that as the message set size grows, the achievable rate approaches the optimum 
rate for this setting. We then consider the universal case, in which the channel 
law is unknown to the decoder. We introduce a novel decoder that uses a mix- 
ture probability assignment instead of the unknown channel law, and obtain an 
achievable rate for this case. 

Finally, we extend the scope to more advanced settings. We use different
flavors of the rateless coding scheme for joint source-channel coding, coding with
side information, and a combination of the two with universal coding, which yields
a communication scheme that does not require any information on the source,
the channel, or the amount of side information at the receiver.
Chapter 1
Introduction 

1.1 Background 

In traditional channel coding schemes the code rate, which is the ratio between 
the lengths of the encoder's input and output blocks, is an integral part of the 
code definition. If one of M messages is to be encoded at rate R, then the 
corresponding codeword has length n = (log M)/R. Provided that the rate is 
chosen properly, the error probability decreases as M grows. The capacity of the 
channel C is defined as the largest value of R for which the error probability can 
vanish. 

An alternative approach to fixed-rate channel coding is rateless codes. In this 
approach, we abandon the basic assumption of a fixed coding rate, and allow the 
codeword length, and hence also the rate, to depend on the channel conditions. 
When the encoder wants to send a certain message, it starts transmitting symbols 
from an infinite-length codeword. The decoder receives the symbols that passed 
through the channel and when it is confident enough about the message, it makes 
a decision. Perhaps the simplest example of a rateless code is the following (see
e.g. [1, Ch.3] or [2, Ch.7]). Suppose that we have a binary erasure channel (BEC)
with erasure probability $\delta$. Suppose also that noiseless feedback exists, i.e. the
encoder at time instant $n$ has access to the outputs of the channel at times
$1, \ldots, n-1$. We use simple repetition coding, in which each binary symbol is
retransmitted until the decoder receives an unerased symbol. Since the erasure
probability is $\delta$, the expected number of transmissions until an unerased symbol is
received is $1/(1-\delta)$. This transmission time implies a rate of $1-\delta$, which is exactly
the capacity of the binary erasure channel. This simple setting exemplifies some
important concepts of rateless codes. First, the transmission time is not fixed, but 
rather is a random variable (geometrically-distributed in the above case); second, 
when the length of the transmission is set dynamically, the error probability 
may be controllable. In this case the transmission is only terminated once the 
decoder knows what message has been transmitted, so the error probability of 
this coding scheme is zero; third, the code design is rate-independent. In fact,
this code can be used for any binary erasure channel; fourth, the continuity of 
the transmission requires feedback to the encoder. Indeed, as we shall see in 
this thesis, when rateless codes are used for point-to-point communication, some 
form of feedback, which can be limited to decision feedback, must exist to enable 
continuity. However, rateless codes are also invaluable for other settings such as 
multicast or broadcast communications, in which the existence of feedback is not 
explicitly required. Shulman [ ] introduced the concept of Static Broadcasting, in 
which the transmitter sends a message to multiple users, and each user remains 
connected until it retrieved enough symbols to make a confident decision. This 
scheme does not require feedback; the user remains online only as much as it 
needs, and the rate is determined according to the time the user spent online. 
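
The repetition scheme over the BEC described above can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function names and the parameter `delta` (the erasure probability) are illustrative assumptions. It counts channel uses until each bit gets through and compares the empirical rate with the capacity of the erasure channel.

```python
import random


def transmissions_until_unerased(delta, rng):
    """Count channel uses until one symbol passes a BEC with erasure prob. delta."""
    n = 1
    while rng.random() < delta:  # symbol erased: feedback triggers a repetition
        n += 1
    return n


def empirical_rate(delta, num_bits=100_000, seed=0):
    """Bits delivered per channel use for repetition coding with feedback."""
    rng = random.Random(seed)
    total_uses = sum(transmissions_until_unerased(delta, rng)
                     for _ in range(num_bits))
    return num_bits / total_uses


if __name__ == "__main__":
    for delta in (0.1, 0.5):
        # The second and third columns should be close to each other.
        print(delta, empirical_rate(delta), 1 - delta)
```

The number of transmissions per bit is geometrically distributed, so the empirical rate should approach one minus the erasure probability as the number of bits grows.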

In this thesis we assume a discrete memoryless channel (DMC) with feedback, 
and devise rateless coding schemes which allow a small (but fixed) error probability
$\epsilon$. We investigate the dependence between the rate, the error probability and
the size of the message set. The entire analysis is done for a finite message set, 
and we show that when the size of the message set is taken to infinity, our results 
agree with classic results from coding theory. We also investigate the rate of con- 
vergence to these results. We start by building a simple rateless coding scheme 
for a known channel. The motivation for this method is due to Wald's analysis 
(see [3, Ch.3]), where he demonstrated that the Sequential Probability Ratio Test 
(SPRT) performs like the most powerful test in terms of error probabilities, while 
using about half the samples on average. 

Building on the rateless coding scheme devised for the case of known channel, 
we obtain a universal channel coding scheme that does not require channel knowledge
at the receiver. Unlike previous results on universal decoding, the results
here are non-asymptotic and are valid for an arbitrary message set size. We then
extend the coding scheme to joint source-channel coding, and show that the optimal
rate is achievable even when the encoder is uninformed of the source statistics.
Next, we use a rateless coding scheme for source coding with side information at 
the receiver and show that the Slepian-Wolf rate for this scenario is achievable 
even when the encoder is unaware of the amount of side information. Finally, 
we show how to combine the above-mentioned techniques with universal source 
coding, to obtain a scheme that can operate when the statistics of both the source 
and the channel are unknown, potentially using side information that is obscure 
to the encoder. 



Our work follows previous results discovered by Shulman [1] for the universal 
case, where the decoder is ignorant of the channel law. In particular, a sequential 
version of the maximal mutual information (MMI) decoder [ I] is used for universal
channel decoding and joint source-channel coding, including the case of side
information at the decoder. However, the results in [1] are asymptotic in the size 
of the message set, while the analysis here is made for a fixed size of the message 
set. For the case of known channel, the decoder used here can be viewed as the 
counterpart of the sequential MMI decoder that uses the channel law rather than 
the empirical mutual information. This scheme was originally introduced by
Polyanskiy in [5], where it is proven to achieve the best variable-length coding
rate. While the analysis in [5] concentrates on finding the best achievable size
of the message set with a constraint on the average decoding time, here
we seek the optimum decoding time for a fixed size of the message set. More
importantly, the analysis introduced here is then extended naturally to
the case of unknown channel, where we use a novel universal decoder, as well as
to joint source-channel coding with and without side information at the receiver.
1.2 Thesis Outline



The rest of the thesis is organized as follows. In Chapter 2 we define rateless 
codes and provide related definitions and notation. In Chapter 3 we survey pre- 
vious results related to universal communication and rateless codes. In Chapter 
4 we treat the case of known channel, for which we obtain an achievable rate 
using rateless codes. We also prove a converse theorem showing that this rate 
is asymptotically optimal, and we analyze the rate of convergence. The case of 
unknown channel is examined in Chapter 5, where we develop a universal decoder 
and analyze its performance for a general DMC. In Chapter 6 we extend the cod- 
ing scheme for the case of message sets with non-equiprobable messages, and we 
also show how rateless coding can be used for problems with side information. 
Chapter 7 concludes the thesis. 
Chapter 2

Definitions and Notation 



Throughout this thesis, random variables will be denoted by capital letters and
their realizations by the corresponding lowercase letters. Vectors are denoted
by a superscript that indicates their length, for instance $X^n = [X_1, \ldots, X_n]$.
Unless otherwise stated, all logarithms are taken to base 2. We focus on
communication over a discrete memoryless channel (DMC) characterized by a
transition probability $p(y|x)$, $x \in \mathcal{X}$, $y \in \mathcal{Y}$, where $\mathcal{X}$ and $\mathcal{Y}$ are the input and
output alphabets of the channel, respectively. With a slight abuse of notation,
we use $p(\cdot|\cdot)$ also to denote the joint transition probabilities of the channel, thus
$p(y^n|x^n) = \prod_{i=1}^{n} p(y_i|x_i)$. The capacity of the channel (in bits per channel use) is
conventionally defined as $C = \max_{q(x)} I(X;Y)$, where $I(X;Y)$ is the mutual
information between the input of the channel and its output, and the maximization
is over all channel input priors $q(x)$. If $|\mathcal{X}| = |\mathcal{Y}|$, and $p(y|x) = 1$ if $x = y$ and
$p(y|x) = 0$ otherwise, then the channel is said to be noiseless, and in that case
$C = \log|\mathcal{X}|$. We also assume that noiseless feedback exists from the receiver to
the transmitter.

A rateless code has the following elements: 

1. Message set $\mathcal{W}$ containing $M$ messages. Without loss of generality we
assume that $\mathcal{W} = \{1, \ldots, M\}$, with corresponding probabilities $\pi(1), \ldots, \pi(M)$.
Occasionally, we define $K = \log M$ as the number of bits conveyed in a message.

2. Codebook $\mathcal{C} = \{c_i\}_{i=1}^{M}$, where each codeword $c_i \in \mathcal{X}^{\infty}$ is generated by
drawing i.i.d. symbols according to a prior $q(x)$, $x \in \mathcal{X}$.

3. Set of encoding functions $f_n : \mathcal{W} \to \mathcal{X}$, $n \ge 1$.

4. Set of decoding functions $g_n : \mathcal{Y}^n \to \mathcal{W} \cup \{0\}$, $n \ge 1$.

Unlike conventional codes, for which the rate is a fundamental property, the
above description does not specify a working rate; hence the term rateless code.
To encode a message $w \in \mathcal{W}$, the encoder starts transmitting the codeword $c_w$
over the channel. Upon receiving each channel output, the decoder can either
decide on one of the messages $w$ or decide to wait for further channel outputs,
returning '0'. Through feedback, the decoder's decision is known to the encoder,
which correspondingly decides whether to transmit further symbols from $c_w$ or
to proceed to the next message. We note that two different forms of feedback can
be assumed here: channel feedback and decision feedback. In channel feedback,
the encoder at time instant $t$ observes the channel outputs so far, and
by imitating the decoder's operation it becomes aware of any decision made by
the decoder. In decision feedback, the encoder is only informed that a decision
has been made, and it can proceed to the next message. While channel feedback
requires no intervention from the decoder in the feedback process, it essentially
assumes that the feedback channel has the same bandwidth as the main channel.
Decision feedback, in contrast, requires only one feedback bit per symbol.

We conclude this section with a few definitions required for the next sections. 

Definition 1. A stopping time $T$ of a rateless code is a random variable defined
as

$$ T = \min\{ n : g_n(Y^n) \neq 0 \} \tag{2.1} $$

Definition 2. An effective rate $R$ of a rateless code is defined as

$$ R = \frac{\log M}{E\{T\}} \tag{2.2} $$

where $E\{T\} = E_q\{E_p\{T\}\}$, i.e. the averaging is done over all possible codebooks
and channel realizations.
Using the definition of stopping time, we can define the error event as the
case in which the decoder stops, deciding on the wrong message. The error event
conditioned on a particular message is defined as

$$ E_w = \{ \hat{W} \neq w \mid W = w \} \tag{2.3} $$

where $\hat{W} = g_T(Y^T)$. The average error probability for the entire message set is
therefore

$$ P_e = \sum_{w=1}^{M} \pi(w) \Pr\{E_w\} \tag{2.4} $$

Definition 3. For a given DMC, an $(R, M, \epsilon)$-code is a rateless code with effective
rate $R$, containing $M$ messages and error probability $P_e \le \epsilon$.
Chapter 3
Previous Results 



As noted, the rateless coding scheme is a special case of communication over a 
channel with feedback. Shannon [6] proved that the capacity of a DMC is not 
increased by adding feedback. However, adding feedback can increase the zero- 
error capacity of the channel. In his well known paper, Burnashev [7] investigated 
the effect of feedback in communication over a DMC by analyzing the error
exponent of such a channel. Introducing the notion of random transmission time,
Burnashev obtained a bound on the mean transmission time for a fixed error
probability, from which he derived the error exponent¹ for a DMC with feedback.
He also proved a converse theorem showing that the expected transmission time, 
hence also the error exponent, are asymptotically optimal. (That is, they coincide 
with the results of the converse theorem as the size of the message set grows to 
infinity.) The main result of [7] is the following theorem. 

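Theorem (Burnashev [7]). The optimum error exponent for a DMC with noiseless
feedback is

$$ \lim_{E\{T\} \to \infty} -\frac{1}{E\{T\}} \log P_e = C_1 \left( 1 - \frac{R}{C} \right), \quad 0 \le R \le C \tag{3.1} $$

where $T$ is the transmission time, $R$ is defined in (2.2) and

$$ C_1 = \max_{x, x' \in \mathcal{X}} D\left( p(\cdot|x) \,\|\, p(\cdot|x') \right) \tag{3.2} $$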

Examining (3.1) we can observe that whenever $R \ge C$, the error exponent
vanishes, which concurs with Shannon's result [6]. Moreover, whenever the channel
has at least two inputs that are completely distinguishable from one another,
i.e. $p(y|x) > 0$ and $p(y|x') = 0$ for some $x, x' \in \mathcal{X}$ and $y \in \mathcal{Y}$, it holds that
$D(p(\cdot|x) \| p(\cdot|x')) \to \infty$ and hence also $C_1 \to \infty$ for that channel. Therefore, the
error exponent in that case is infinite at every rate below the channel capacity,
which implies that the zero-error capacity coincides with the channel capacity $C$.

¹ Referred to as the reliability function.

Also for the case of feedback channels, Shulman [ ] developed a coding scheme
providing reliable communication over an unknown channel, without compromising
the rate. Introducing the concept of static broadcasting, which is based on
a random codebook and a universal sequential decoder, he demonstrated that it is
possible to achieve vanishing error probability at a rate that tends to the capacity
of the channel as the size of the message set grows indefinitely. Furthermore,
Shulman showed that even if the statistics of the information source are unknown
to the transmitter, this scheme achieves the optimal decoding length that would
have been achieved if the source were compressed by an optimal source encoder
and the channel were known at both ends. More formally, if $K$ information bits of
a source $S$ were to be transmitted over an unknown channel $W$, then the average
decoding length satisfies

$$ \frac{E\{T\}}{K} = \frac{H(S)}{I(P;W)} \tag{3.3} $$

where $P$ is the codebook generation prior and $I(P;W)$ is the mutual information
between the input and the output of the channel $W$ when the input is drawn
according to distribution $P$.

Shulman also used the coding scheme for source encoding of correlated sources.
He demonstrated that using static broadcasting, it is possible to achieve the
Slepian-Wolf optimal rate region. Combining all into one communication scheme,
the achievable decoding length is

$$ \frac{E\{T\}}{K} = \frac{H(S|Z)}{I(P;W)} \tag{3.4} $$

where $Z$ is the side information at the decoder. Shulman's work has been the
main inspiration for this research.
For the case of unknown channel, Tchamkerten and Telatar in [8] used a rateless
coding scheme similar to the one defined in Chapter 2, where the stopping
condition is that the mutual information between (at least) one of the codewords
and the channel output sequence exceeds a certain time-dependent threshold.
The authors proved that this scheme can achieve the capacity of a general
DMC.² Moreover, they demonstrated that for the class of binary symmetric
channels with crossover probability $L \in [0, 1/2)$, this coding scheme can achieve
Burnashev's exponent at a rate bounded by any fraction of the channel capacity.
The latter result is obtained by using a second coding phase, in which the
transmitter indicates whether the decoder's decision is correct (an Ack/Nack phase).
Tchamkerten and Telatar also demonstrated that for the class of Z channels with
parameter $L \in [0, 1)$, the achievable rate can be arbitrarily close to the channel
capacity, while the error exponent is infinite. The latter result also coincides
with Burnashev's exponent ($C_1$ in (3.1) is infinite in this case), since error-free
communication is attainable for the Z channel.

We note that all the above-mentioned results were asymptotic in the size of
the message set. Recently, Polyanskiy, Poor and Verdú in [5] introduced
non-asymptotic results for communication over a DMC with feedback. Through the
use of variable-rate coding and sequential decoding they obtained upper and
lower bounds for the maximal message set size for fixed bounds on the error
probability and mean decoding length. The authors showed that for an error
probability constraint $P_e \le \epsilon$ and mean decoding length constraint $E\{T\} \le \ell$,
the maximal message set size $M^*(\ell, \epsilon)$ satisfies

$$ \frac{\ell C}{1-\epsilon} - \log \ell + O(1) \le \log M^*(\ell, \epsilon) \le \frac{\ell C}{1-\epsilon} + O(1) \tag{3.5} $$

The setting of [5], as well as the coding scheme, is similar to the one defined later
in Chapter 4. However, while in [5] the optimization is on $M$, for fixed $\epsilon$ and
$\ell$, we fix $\epsilon$ and $M$ and find the optimum mean decoding length. The analysis is
slightly different, but the results of Chapter 4 comply with [5]. The analysis in
Chapter 4, coming next, lays the ground for the derivation of our novel results
for the case of unknown channel.

² Since no assumption has been made on the capacity-achieving prior, the authors only
demonstrated that the rate approaches $I(P;Q)$, where $P$ is the codebook generation prior and $Q$ is the
transition probability of the channel.
Chapter 4



Rateless Coding — Known 
Channel 



4.1 Sequential Decoder 

We begin by introducing a rateless coding scheme for noisy channels and analyzing
its effective rate, under certain constraints on the size of the message set and the
error probability. As will be shown in the sequel, the effective rate is closely
related to the channel capacity $C$. More precisely, we will show that under the
conventional setting, in which the size of the message set is taken to infinity, the
effective rate coincides with the capacity of the channel.

Consider a discrete memoryless source with a set of $M$ equiprobable messages,
i.e. $\pi(i) = 1/M$, $i = 1, \ldots, M$. We use a rateless code as defined in Chapter 2,
where each codeword $c_i$, $i = 1, \ldots, M$, is generated by drawing i.i.d. symbols
according to $q(x)$, the capacity-achieving prior of the channel. The source of
randomness generating the codewords is shared by the encoder and the decoder,
so that the codebook is known at both ends. The decoder uses the following
decision rule:

$$ g_n(y^n) = \begin{cases} w, & \text{if } \prod_{k=1}^{n} \frac{p(c_{w,k}|y_k)}{q(c_{w,k})} \ge A \\ 0, & \text{if no such } w \text{ exists} \end{cases} \tag{4.1} $$

where $\{c_{w,k}\}_{k=1}^{\infty}$ are the symbols in $c_w$. If the threshold crossing condition in
(4.1) is satisfied by more than one codeword, we randomly choose one of them
and declare an error. We note here that similar decoders have been proposed
by Polyanskiy [5] and Burnashev [7, Ch.3]. The decision rule at (4.1) can be
equivalently written as

$$ g_n(y^n) = \begin{cases} w, & \text{if } \sum_{k=1}^{n} z_{w,k} \ge a \\ 0, & \text{if no such } w \text{ exists} \end{cases} \tag{4.2} $$

where

$$ z_{w,k} = \log \frac{p(c_{w,k}|y_k)}{q(c_{w,k})}, \quad k = 1, \ldots, n \tag{4.3} $$

and we define $a = \log A$.

The above-described coding scheme can be summarized as follows. Having
selected a message, the encoder starts transmitting an infinite-length random
codeword corresponding to that message. The decoder sequentially receives symbols
from this codeword that passed through the channel, and at each time instant
$k$ calculates $z_{w,k}$ for $w = 1, \ldots, M$. It then updates a set of $M$ accumulators, each
corresponding to a possible message, and checks whether any of those crossed a
prescribed threshold $a$. If none of the accumulators crossed the threshold, '0' is
returned and the decoder waits for the next channel output; if exactly one accumulator
crossed the threshold, the decoder makes a decision; and if more than one threshold
crossing occurred, an error is declared. In the two latter cases, the encoder
proceeds to the next codeword.
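
The accumulator-based procedure summarized above can be sketched in a few lines. The following is an illustrative simulation only, under assumptions that are not part of the text: the channel is a binary symmetric channel with crossover probability `p` in (0, 1/2), the codebook prior is uniform (so the channel output is also uniform and the backward channel equals the forward one), message 0 is always sent, and the threshold is set to log M minus log eps. All function and parameter names are hypothetical.

```python
import math
import random


def simulate_message(M, eps, p, rng):
    """Send message 0 over a BSC(p); return (stopping_time, decoded_ok)."""
    a = math.log2(M) - math.log2(eps)   # log-domain threshold
    z_match = math.log2(2 * (1 - p))    # increment when codeword symbol == output
    z_differ = math.log2(2 * p)         # increment when they differ
    acc = [0.0] * M                     # one accumulator per message
    t = 0
    while True:
        t += 1
        # t-th symbol of every codeword, drawn i.i.d. from the uniform prior
        symbols = [rng.getrandbits(1) for _ in range(M)]
        y = symbols[0] ^ (rng.random() < p)   # channel flips the sent bit w.p. p
        for w in range(M):
            acc[w] += z_match if symbols[w] == y else z_differ
        crossed = [w for w in range(M) if acc[w] >= a]
        if crossed:
            # Correct only if the true message alone crossed the threshold.
            return t, crossed == [0]


def estimate(M=16, eps=0.05, p=0.1, trials=2000, seed=0):
    """Empirical effective rate (log M / mean stopping time) and error rate."""
    rng = random.Random(seed)
    results = [simulate_message(M, eps, p, rng) for _ in range(trials)]
    mean_T = sum(t for t, _ in results) / trials
    error_rate = sum(not ok for _, ok in results) / trials
    return math.log2(M) / mean_T, error_rate


if __name__ == "__main__":
    print(estimate())
```

Drawing the codeword symbols on the fly is equivalent to a pre-generated random codebook here, since the symbols are i.i.d. and each is used once.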

For the above-described scheme we have the following theorem.

Theorem 1. For the decoder in (4.2) with $P_e \le \epsilon$, the following effective rate is
achievable:

$$ R = \frac{C}{1 + \frac{C - \log \epsilon}{\log M}} \tag{4.4} $$

Proof. Since $T$ is a stopping time of the i.i.d. sequence $Z_1, Z_2, \ldots$, Wald's equation
[3] implies

$$ E\{Z_1 + \cdots + Z_T\} = E\{Z\} \cdot E\{T\} \tag{4.5} $$

where $E\{Z\}$ is the expectation of a single sample $Z_k$. If $X_k$ and $Y_k$ are the input and
output of the channel, respectively, then by the definition of $Z_k$ we have

$$ E\{Z\} = E\{Z_k\} = E\left\{ \log \frac{p(X_k|Y_k)}{q(X_k)} \right\} = C. \tag{4.6} $$

Furthermore, since the stopping condition was not fulfilled at time instant $T - 1$ we
have

$$ Z_1 + \cdots + Z_{T-1} < a \tag{4.7} $$

which implies

$$ Z_1 + \cdots + Z_T < a + Z_T \tag{4.8} $$

Combining (4.5), (4.6) and (4.8) we obtain

$$ E\{T\} \le \frac{a + C}{C}. \tag{4.9} $$

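We now tune the threshold parameter $a$ to meet the error probability requirement.
Suppose that the stopping time of the correct codeword is $T_w$. An error
occurs if a competing codeword $c_{w'}$, independent of $\{Y_k\}_{k=1}^{\infty}$, crosses the threshold
before $c_w$ does. Thus,

$$ \Pr\{E_w\} \le \Pr\left\{ \bigcup_{w' \neq w} \bigcup_{t=1}^{T_w} \left\{ \prod_{k=1}^{t} \frac{p(c_{w',k}|Y_k)}{q(c_{w',k})} \ge A \right\} \right\} \tag{4.10} $$

$$ \le (M-1) \Pr\left\{ \bigcup_{t=1}^{T_w} \left\{ \prod_{k=1}^{t} \frac{p(X_k|Y_k)}{q(X_k)} \ge A \right\} \right\} \tag{4.11} $$

$$ \le (M-1) \Pr\left\{ \bigcup_{t=1}^{\infty} \left\{ \prod_{k=1}^{t} \frac{p(X_k|Y_k)}{q(X_k)} \ge A \right\} \right\} \tag{4.12} $$

where (4.11) follows from the union bound for an arbitrary series $\{X_k\}_{k=1}^{\infty}$ drawn
i.i.d. from $q(x)$, independently of $\{Y_k\}_{k=1}^{\infty}$. Note that the bound in (4.12) represents
the probability that a randomly-chosen codeword will exceed the threshold
at any time instant. Define a sequence of random variables

$$ U_t = \begin{cases} \frac{p(X_t|Y_t)}{q(X_t)}, & \text{if } \prod_{k=1}^{t-1} U_k < A \\ 1, & \text{otherwise} \end{cases} \tag{4.13} $$

If at instant $t$ the threshold at (4.12) is exceeded for the first time, then we have
$U_k = p(X_k|Y_k)/q(X_k)$ for $k = 1, \ldots, t$ and $U_k = 1$ for all $k > t$. Therefore, it is
easy to see that

$$ \bigcup_{t=1}^{\infty} \left\{ \prod_{k=1}^{t} \frac{p(X_k|Y_k)}{q(X_k)} \ge A \right\} \iff \prod_{t=1}^{\infty} U_t \ge A \tag{4.14} $$

We can also see that $E\{U_t\} = 1$ for all $t$ because

$$ E\left\{ U_t \,\middle|\, \prod_{k=1}^{t-1} U_k \ge A \right\} = 1 \tag{4.15} $$

since $U_t = 1$ deterministically in this case, and

$$ E\left\{ U_t \,\middle|\, \prod_{k=1}^{t-1} U_k < A \right\} = E\left\{ E\left\{ \frac{p(X_t|Y_t)}{q(X_t)} \,\middle|\, Y_t \right\} \right\} \tag{4.16} $$

$$ = E\left\{ \sum_{x \in \mathcal{X}} q(x) \frac{p(x|Y_t)}{q(x)} \right\} \tag{4.17} $$

$$ = 1 \tag{4.18} $$

where (4.17) follows since $X_t$ and $Y_t$ are independent. For an arbitrary $N$ we have

$$ E\left\{ \prod_{t=1}^{N} U_t \right\} = E\left\{ E\left\{ \prod_{t=1}^{N} U_t \,\middle|\, \prod_{t=1}^{N-1} U_t \right\} \right\} \tag{4.19} $$

$$ = E\{U_N\} \cdot E\left\{ \prod_{t=1}^{N-1} U_t \right\} \tag{4.20} $$

$$ = E\left\{ \prod_{t=1}^{N-1} U_t \right\} = \ldots = E\{U_1\} = 1 \tag{4.21} $$

Since the above holds for all $N$, we also have

$$ E\left\{ \prod_{t=1}^{\infty} U_t \right\} = 1 \tag{4.22} $$

Returning to (4.12), we get

$$ \Pr\{E_w\} \le (M-1) \Pr\left\{ \bigcup_{t=1}^{\infty} \left\{ \prod_{k=1}^{t} \frac{p(X_k|Y_k)}{q(X_k)} \ge A \right\} \right\} \tag{4.23} $$

$$ = (M-1) \Pr\left\{ \prod_{t=1}^{\infty} U_t \ge A \right\} \tag{4.24} $$

$$ \le \frac{M-1}{A} \tag{4.25} $$

where (4.25) follows from (4.22) and Markov's inequality. Since the above holds
for all $w \in \mathcal{W}$, we also have

$$ P_e \le \frac{M-1}{A} \tag{4.26} $$

By choosing $a = \log M - \log \epsilon$, or equivalently $A = M/\epsilon$, we secure that
$P_e \le \epsilon$. Substituting $a$ into (4.9) and using Definition 2, we obtain (4.4). □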
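
The last step of the proof is a short piece of arithmetic that can be verified numerically. The sketch below is illustrative (function names are assumptions, logarithms are base 2 as in the text): it computes the threshold chosen in the proof, the bound (4.9) on the mean stopping time, and the effective rate of Theorem 1.

```python
import math


def threshold(M, eps):
    """Log-domain threshold a = log M - log eps (equivalently A = M / eps)."""
    return math.log2(M) - math.log2(eps)


def mean_stopping_time_bound(M, eps, C):
    """Right-hand side of (4.9): (a + C) / C, with C in bits per channel use."""
    return (threshold(M, eps) + C) / C


def achievable_rate(M, eps, C):
    """Effective rate of Theorem 1: C / (1 + (C - log eps) / log M)."""
    return C / (1 + (C - math.log2(eps)) / math.log2(M))


if __name__ == "__main__":
    M, eps, C = 2 ** 20, 0.01, 0.5
    # Both expressions give the same rate: log M divided by the bound on E{T}.
    print(achievable_rate(M, eps, C))
    print(math.log2(M) / mean_stopping_time_bound(M, eps, C))
```

Dividing log M by the bound on the mean stopping time reproduces the rate expression, which is the substitution the proof refers to.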

It is important to note that the encoding operation is independent of the 
working rate; the encoder needs to know the channel law only to generate the 
codebook. However, if the channel is known to belong to a family for which the
capacity-achieving prior is known (e.g. the uniform prior for symmetric channels),
then the optimal rate can be achieved even when the encoder is uninformed of
the channel law. Furthermore, from a practical point of view, using the uniform
prior instead of the capacity-achieving prior is known to perform relatively well 
in many cases. For instance, using a uniform prior in a binary channel will lose 
at most 6% of the capacity (see [J, Ch.5]). 

4.2 Coding Theorem for Known Channel 

We will now use the coding scheme developed in Section 4.1 to prove the main
result for rateless channel coding. For a fixed error probability, we will obtain an
achievable rate using rateless codes. Then, we will prove that this rate is within
$O(\log\log M / \log M)$ from the optimal rate achievable with this error probability.
Before we get to the main theorem, we prove the following lemma, which
facilitates some refinement in the achievable rate.

Lemma 1. Suppose that an $(R, M, \epsilon)$-code exists for a DMC. Then for any
$0 < \alpha < 1$, there also exists an $(R', M, \epsilon')$-code for the same channel, where

$$ R' = (1-\alpha)^{-1} R \tag{4.27} $$

$$ \epsilon' = \alpha + \epsilon - \alpha\epsilon \tag{4.28} $$
Proof. To show that the triplet $(R', M, \epsilon')$ is achievable, we use the original code
with randomized decision-making at the decoder. For each transmitted message,
the decoder either terminates the transmission immediately and declares an error,
with probability $\alpha$, or uses the original decision rule. Denote the stopping time
of the original decoder and the modified one by $T$ and $T'$, respectively. The
expected decision time of the modified decoder is

$$ E\{T'\} = (1-\alpha) E\{T\}, \tag{4.29} $$

which implies

$$ R' = (1-\alpha)^{-1} R. \tag{4.30} $$

The error event in the modified scheme is a union of two non-mutually-exclusive
events: error in the original decoder and the event of early termination. The
probability of this union is

$$ \epsilon' = \alpha + \epsilon - \alpha\epsilon. \tag{4.31} $$

Finally, we note that the number of messages in the codebook remains unchanged,
which completes the proof of the lemma. □

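Theorem 2. For rateless codes, the following rate is achievable:

$$ R' = \begin{cases} \dfrac{1 - \frac{1}{\log M}}{1-\epsilon} \cdot \dfrac{C}{1 + \frac{C + \log\log M}{\log M}}, & \epsilon \ge 1/\log M \\[2ex] \dfrac{C}{1 + \frac{C - \log\epsilon}{\log M}}, & \epsilon < 1/\log M \end{cases} \tag{4.32} $$

We note that if $\epsilon$ is fixed and $M$ is large enough so that $\epsilon \ge 1/\log M$, the
achievable rate has the following asymptotics:

$$ R' = \frac{C}{1-\epsilon} \left( 1 - O\left( \frac{\log\log M}{\log M} \right) \right) \tag{4.33} $$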
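Proof. Theorem 1 implies that the triplet $(R, M, \delta)$ is achievable for all $0 < \delta < 1$,
where

$$ R = \frac{C}{1 + \frac{C - \log\delta}{\log M}} \tag{4.34} $$

By Lemma 1, we can also achieve $(R', M, \delta')$, where

$$ R' = (1-\alpha)^{-1} \frac{C}{1 + \frac{C - \log\delta}{\log M}} \tag{4.35} $$

$$ \delta' = \alpha + \delta - \alpha\delta \tag{4.36} $$

for all $0 < \alpha < 1$. By choosing

$$ \alpha = \frac{\epsilon - \delta}{1 - \delta} \tag{4.37} $$

we obtain

$$ R' = \frac{1-\delta}{1-\epsilon} \cdot \frac{C}{1 + \frac{C - \log\delta}{\log M}}, \tag{4.38} $$

$$ \delta' = \epsilon \tag{4.39} $$

Since the foregoing analysis holds for all $0 < \delta \le \epsilon$, we can choose $\delta = \min\{\epsilon, 1/\log M\}$
to obtain (4.32). □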
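
The construction in the proof (run the Theorem 1 decoder at a stricter error level, then apply the randomization of Lemma 1) can be written out as a short calculation. This is a sketch with hypothetical function names, using base-2 logarithms; `delta` is the error level fed to Theorem 1 and `alpha` is the early-termination probability of Lemma 1.

```python
import math


def rate_theorem1(M, delta, C):
    """Rate of Theorem 1 at error level delta."""
    return C / (1 + (C - math.log2(delta)) / math.log2(M))


def rate_theorem2(M, eps, C):
    """Rate of Theorem 2: Theorem 1 at level delta, then Lemma 1 randomization."""
    delta = min(eps, 1 / math.log2(M))
    alpha = (eps - delta) / (1 - delta)   # alpha = 0 when delta == eps
    return rate_theorem1(M, delta, C) / (1 - alpha)


if __name__ == "__main__":
    C, eps = 0.5, 0.1
    for K in (10, 100, 1000, 10000):      # K = log2(M) message bits
        M = 2 ** K
        print(K, rate_theorem1(M, eps, C), rate_theorem2(M, eps, C),
              C / (1 - eps))
```

For a fixed error level and growing message set, the last column, C/(1 - eps), is the value the randomized rate approaches from below.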

Remark 4.1. If $\epsilon < 1/\log M$, the choice $\delta = \epsilon$ implies $\alpha = 0$, that is, no randomization
at the decoder. This result could be anticipated, since the randomized
decoder trades rate for reliability: it obtains a better effective rate with some
compromise on the error probability. Hence, whenever the error probability
constraint is more important than the working rate, randomization can only worsen
matters.

4.3 Error Exponent 

Theorem 2 in the previous section provides a relation between the working rate
and the allowed error probability. We will now investigate this dependency in the
regime of low error probability by developing the error exponent induced by this
coding scheme. Assuming that a low error probability is required, randomization
at the decoder is inapplicable, so (4.32) can be rewritten as

$$ -\frac{R \log\epsilon}{\log M} = C - R - \frac{RC}{\log M} \tag{4.40} $$

Recall that $R = \log M / E\{T\}$, so

$$ -\frac{\log\epsilon}{E\{T\}} = C - R - \frac{RC}{\log M} \triangleq E(R) \tag{4.41} $$

We can see that the error exponent is a linear function of the rate, which is
also the case in Burnashev's analysis (3.1) (albeit with a different coefficient).
Furthermore, as $M$ grows, the error exponent converges to $C - R$ and the
convergence is dominated by a term of order $O(1/\log M)$, or $O(1/K)$. This term
can be interpreted as a penalty for using a finite message set.

4.4 Weak Converse 

In the previous sections we have seen that if we use a codebook with $M$ messages
and allow an error probability $P_e \le \epsilon$, then we can achieve an effective rate with
the following asymptotics:

$$ R' = \frac{C}{1-\epsilon} \left( 1 - O\left( \frac{\log\log M}{\log M} \right) \right) \tag{4.42} $$

We will now prove that under the above constraints on the message set and
the error probability, the best achievable rate has the same asymptotics. In
other words, the achievable rate at (4.32) converges to the optimal rate, and the
convergence is dominated by a term of order $O(1/\log M)$.

Theorem 3. Given a decoder with random¹ stopping time $T$, any rate for which
the probability of error does not exceed $\epsilon$ satisfies

$$ R \le \frac{C}{1-\epsilon} \left( 1 + O\left( \frac{1}{\log M} \right) \right). \tag{4.43} $$

Proof. Define

$$ \mu(n) = H(W|Y^n) + nC. \tag{4.44} $$

By [7, Lemma 2] we have

$$ E\{\mu(n+1)|Y^n\} - \mu(n) = E\{H(W|Y^{n+1}) - H(W|Y^n) \mid Y^n\} + C \ge 0 \tag{4.45} $$

which implies that $\mu(n)$ is a submartingale with respect to the process $\{Y_k\}_{k=1}^{\infty}$.
Therefore we have

$$ \log M = H(W) = \mu(0) \le E\{\mu(T)\} = E\{H(W|Y^T)\} + C \cdot E\{T\} \tag{4.46} $$

¹ A fixed stopping time is a special case of a random stopping time, in which $T$ takes only one
value.
Furthermore, by [7, Lemma 1] we have

$$ E\{H(W|Y^T)\} \le h(P_e) + P_e \cdot \log(M-1) \le 1 + \epsilon \cdot \log M \tag{4.47} $$

where (4.47) follows from the requirement $P_e \le \epsilon$, and from an upper bound on
the binary entropy function. Combining (4.46) and (4.47) we obtain

$$ \log M \le 1 + \epsilon \cdot \log M + C \cdot E\{T\} \tag{4.48} $$

which implies

$$ \frac{\log M}{E\{T\}} \le \frac{C}{1-\epsilon} \left( 1 + \frac{1}{C \cdot E\{T\}} \right) \tag{4.49} $$

Furthermore, from (4.48) we can see that

$$ C \cdot E\{T\} \ge (1-\epsilon) \cdot \log M - 1 \tag{4.50} $$

and therefore (4.49) can be replaced by

$$ \frac{\log M}{E\{T\}} \le \frac{C}{1-\epsilon} \left( 1 + O\left( \frac{1}{\log M} \right) \right) \tag{4.51} $$

□
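
The converse can be made concrete by combining (4.49) with (4.50), which gives an explicit finite-M upper bound on the rate. The sketch below is illustrative: the function name is an assumption, logarithms are base 2, and the explicit form of the correction term is derived here from the two inequalities rather than quoted from the text.

```python
import math


def converse_rate_bound(M, eps, C):
    """Upper bound on log M / E{T} obtained by combining (4.49) and (4.50)."""
    slack = (1 - eps) * math.log2(M) - 1   # lower bound on C * E{T}, from (4.50)
    if slack <= 0:
        raise ValueError("bound is vacuous: need (1 - eps) * log2(M) > 1")
    return C / (1 - eps) * (1 + 1 / slack)


if __name__ == "__main__":
    C, eps = 0.5, 0.1
    for K in (10, 100, 1000):              # K = log2(M) message bits
        # The bound decreases towards C / (1 - eps) as K grows.
        print(K, converse_rate_bound(2 ** K, eps, C), C / (1 - eps))
```

The excess over C/(1 - eps) decays like 1/log M, which is the order of the correction term in the theorem.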



Remark 4.2. While (4.32) approaches (4.51) for large $M$, the upper bound is not
tight for a finite $M$. Note that the converse used here is "weak", in that it is based
on Fano's inequality, which is known to be loose in many cases. We conjecture
that a strong converse can be found, which will be tighter (i.e. closer to (4.32))
even in the non-asymptotic realm.

Remark 4.3. Equation (4.33), the achievable rate, is essentially equivalent to the
left-hand side of [5, Eq.18], and equation (4.51), the upper bound on the rate,
is equivalent to the right-hand side of that equation. Note, however, that the
formulation is slightly different: in [5] the size of the message set $M$ is optimized
with a constraint on the maximal transmission time, while here $M$ is fixed and the
transmission time is minimized.
4.5 Further Discussions



4.5.1 Application for Gaussian Channels 

While the analysis in Sections 4.1 and 4.2 is done for discrete channels, it can be
easily extended to memoryless Gaussian channels. Suppose that $X_t$ and $Y_t$ are
the input and output of an additive white Gaussian noise channel at time instant
$t$, i.e.

$$Y_t = X_t + V_t, \quad t = 1, 2, \ldots \tag{4.52}$$

where $\{V_t\}_{t=1}^{\infty}$ is a sequence of i.i.d. Gaussian RVs with zero mean
and a known variance. The encoding and the decoding processes, as well as the
expression for the resulting effective rate, are similar to those of the DMC,
where $q(\cdot)$ is the codebook generation PDF and $p(\cdot|\cdot)$ is the
transition PDF of the backward channel.

Specifically, consider the above-described setting where $V_k \sim N(0, \theta)$.
Suppose that the codebook is Gaussian with power constraint $P$, i.e.
$C_{m,k} \sim N(0, P)$ for all $m, k$. (Here again, $C_{m,k}$ is the $k$-th symbol
of the $m$-th codeword.) The decoding rule is given by (4.1), where

$$q(x) = (2\pi P)^{-1/2} \exp\left\{-\frac{x^2}{2P}\right\} \tag{4.54}$$

The effective rate of the decoder is given in (4.4), where

$$C = \frac{1}{2}\log\left(1 + \frac{P}{\theta}\right) \tag{4.55}$$
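The Gaussian setting can be checked numerically. The sketch below computes the AWGN capacity and the per-symbol decoding metric log p(x|y)/q(x), which by Bayes' rule equals log p(y|x)/p(y), and verifies by simulation that the average metric increment is close to C. The function names, the parameter name `noise_var` for the noise variance, and the use of base-2 logarithms are assumptions of this example rather than the thesis's notation.

```python
import math
import random


def awgn_capacity(power, noise_var):
    """Capacity in bits per channel use of an AWGN channel."""
    return 0.5 * math.log2(1.0 + power / noise_var)


def metric_increment(x, y, power, noise_var):
    """log2 of p(x|y)/q(x), computed as log2 of p(y|x)/p(y)."""
    var_y = power + noise_var  # Y = X + V with independent X and V
    log_p_y_given_x = (-0.5 * math.log(2 * math.pi * noise_var)
                       - (y - x) ** 2 / (2 * noise_var))
    log_p_y = -0.5 * math.log(2 * math.pi * var_y) - y ** 2 / (2 * var_y)
    return (log_p_y_given_x - log_p_y) / math.log(2)


def average_increment(power, noise_var, n, seed=0):
    """Empirical mean of the metric increment over n channel uses."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, math.sqrt(power))          # Gaussian codebook symbol
        y = x + rng.gauss(0.0, math.sqrt(noise_var))  # channel output
        total += metric_increment(x, y, power, noise_var)
    return total / n


if __name__ == "__main__":
    print(awgn_capacity(1.0, 1.0), average_increment(1.0, 1.0, 20000))
```

The drift of the accumulated metric per channel use is the quantity that Wald's identity divides by, which is why it should match the capacity.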

4.5.2 Limited Feedback Channel 

In the foregoing analysis, we assumed that the feedback channel is used once per
main channel use. In practice, however, it may be desirable to reduce the amount
of data transmitted over the feedback channel. For instance, in the case of
broadcasting to multiple users, the upstream channel may have a more stringent
bandwidth constraint as it must be accessed by all users. It is therefore
interesting to see how lowering the frequency of the feedback affects the
performance of the rateless coding scheme. Suppose that we want to use the
feedback channel only once per $s$ received symbols. The maximal number of
excess symbols transmitted over the main channel (i.e. the number of symbols
transmitted after a decoder without feedback limitation would acknowledge the
message) is $s - 1$, which implies an effective rate of

$$R = \frac{C}{1 + \dfrac{(s-1)C - \log\epsilon}{\log M}} \tag{4.56}$$

From (4.56) we see that limiting the feedback frequency has negligible effect if
either $s \ll (-\log\epsilon)/C$ or $s \ll (\log M)/C$. In the former case, the
required confidence level is high, and in the latter case the messages are long.
That is, in both cases the codewords are long with respect to the capacity of
the channel, which implies long transmission time. Therefore, in both cases the
excess decoding time is small compared to the entire transmission length, and
the effect of limiting the feedback is negligible.
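To get a feeling for the size of the feedback penalty, the following sketch evaluates (4.56) for several feedback periods. The function name `limited_feedback_rate` and the base-2 logarithm convention are assumptions of this example.

```python
import math


def limited_feedback_rate(capacity, eps, log_m, s):
    """Effective rate (4.56) when feedback is sent once per s received symbols."""
    if s < 1:
        raise ValueError("feedback period s must be at least 1")
    excess = (s - 1) * capacity - math.log2(eps)
    return capacity / (1.0 + excess / log_m)


if __name__ == "__main__":
    capacity, eps, log_m = 0.5, 1e-3, 10000.0
    for s in (1, 10, 50, 1000):
        print(s, limited_feedback_rate(capacity, eps, log_m, s))
```

With these illustrative numbers, (log M)/C = 20000, so periods up to a few tens of symbols change the rate by a fraction of a percent, while s = 1000 is no longer negligible.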
Chapter 5

Rateless Coding — Unknown 
Channel 

In Chapter 4 we assumed that the communication channel, characterized by
$p(y|x)$, is known at the receiver end. Assume now that the underlying channel
is unknown to the receiver. The capacity of the channel is known to be
achievable in this scenario using sequential versions of the Maximal Mutual
Information (MMI) decoder [1], [8]. However, while these schemes provide
reliable communication at a rate equal to the channel capacity, they assume that
the size of the message set M is infinite. In this chapter we try to answer the
question of whether universal communication is feasible with a finite message
set, and if it is, what rates are achievable. As we shall see shortly, it is
possible to achieve reliable communication over an unknown channel even when
the message set is finite, and we can also bound the rate degradation due to
lack of information about the channel law.

5.1 Achievable Rate for an Unknown Channel 

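Suppose that we wish to communicate over a DMC with unknown (backward)
transition probabilities

$$\theta_{ij} = \Pr\{X = i \mid Y = j\}, \quad i = 1, \ldots, |\mathcal{X}|, \quad j = 1, \ldots, |\mathcal{Y}| \tag{5.1}$$

We use a coding scheme similar to the one described in Chapter 4 with the
following modification. Instead of using the true transition probability
$p_\theta(x^t|y^t)$, which is unknown to the decoder, we use a universal
probability assignment defined as

$$p_U(x^t|y^t) = \int_{\Lambda} w(\theta')\, p_{\theta'}(x^t|y^t)\, d\theta' \tag{5.2}$$

where

$$\Lambda = \left\{\theta' \in [0,1]^{|\mathcal{X}||\mathcal{Y}|} \;:\; \sum_{i=1}^{|\mathcal{X}|} \theta'_{ij} = 1, \quad j = 1, \ldots, |\mathcal{Y}|\right\} \tag{5.3}$$

and the weight function $w(\cdot)$ is chosen to be Jeffreys prior$^1$, i.e.

$$w(\theta') = \frac{1}{B} \prod_{j=1}^{|\mathcal{Y}|} \prod_{i=1}^{|\mathcal{X}|} (\theta'_{ij})^{-1/2} \tag{5.4}$$

where

$$B = \int_{\Lambda} \prod_{j=1}^{|\mathcal{Y}|} \prod_{i=1}^{|\mathcal{X}|} (\theta'_{ij})^{-1/2}\, d\theta' \tag{5.5}$$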
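With a Dirichlet(1/2) weight on each column of the backward channel, the mixture integral in (5.2) has a closed form in terms of Gamma functions (the standard Dirichlet-multinomial marginal). The sketch below evaluates it from the joint symbol counts and compares it with the maximum-likelihood fit to the same sequences. The function names and the convention that input symbols are the integers 0, ..., x_size - 1 are assumptions of this example.

```python
import math
from collections import Counter


def log2_mixture(xs, ys, x_size):
    """log2 of the Dirichlet(1/2) mixture assignment p_U(x^t | y^t)."""
    pair_counts = Counter(zip(xs, ys))
    y_counts = Counter(ys)
    total = 0.0
    # Each observed output symbol contributes one independent factor;
    # unobserved output symbols contribute a factor of one.
    for y, n_y in y_counts.items():
        total += math.lgamma(x_size / 2) - x_size * math.lgamma(0.5)
        total -= math.lgamma(n_y + x_size / 2)
        for x in range(x_size):
            total += math.lgamma(pair_counts[(x, y)] + 0.5)
    return total / math.log(2)


def log2_max_likelihood(xs, ys):
    """log2 of the best-fitting backward channel probability of x^t given y^t."""
    pair_counts = Counter(zip(xs, ys))
    y_counts = Counter(ys)
    return sum(n * math.log2(n / y_counts[y])
               for (_, y), n in pair_counts.items())


if __name__ == "__main__":
    xs = [0, 1, 1, 0, 1, 1, 1, 0]
    ys = [0, 1, 1, 0, 1, 0, 1, 0]
    print(log2_max_likelihood(xs, ys) - log2_mixture(xs, ys, 2))
```

The printed gap is nonnegative by construction, since a mixture can never exceed the maximum of the probabilities it averages.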



Remark 5.1. While the unknown channel is usually characterized by a set of
forward transition probabilities

$$\Pr\{Y = j \mid X = i\}, \quad i = 1, \ldots, |\mathcal{X}|, \quad j = 1, \ldots, |\mathcal{Y}|$$

the entire derivation here is done for the backward channel parameterization
given in (5.1). However, this need not concern us, since the entire analysis
assumes a known input prior $q(x)$, and therefore, given the forward transition
probabilities, the parameters in (5.1) are well-defined. Moreover, the region of
backward parameters induced by the forward transition probabilities and $q(x)$
is clearly contained in the region $\Lambda$ of (5.3). Therefore, if a coding
scheme is universal with respect to all possible realizations of the backward
channel, it is also universal w.r.t. all possible realizations of the forward
channel.

The universal probability assignment implies the following decoding rule,
which is the universal counterpart of (4.1):

$$g_n(y^n) = \begin{cases} w, & p_U(c_w|y^n) \ge A \cdot q(c_w) \\ 0, & \text{if no such } w \text{ exists} \end{cases} \tag{5.6}$$
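The decoding rule can be run sequentially, because the Dirichlet(1/2) mixture factors into one-step predictive probabilities of the form (N + 1/2)/(N_y + |X|/2), where N counts past occurrences of the current (input, output) pair and N_y counts past occurrences of the current output. The sketch below is a minimal illustration under two assumptions that are not taken from the thesis: the codebook prior q is uniform over the input alphabet, and the "no decision" outcome is represented by `None` instead of 0.

```python
import math


def universal_sequential_decode(codebook, received, x_size, threshold):
    """Return (message index, stopping time) for a threshold rule like (5.6).

    codebook  : list of codewords, each a list of symbols in 0..x_size-1
    received  : list of channel outputs observed so far
    threshold : the value A; a uniform codebook prior q is assumed
    """
    log_threshold = math.log2(threshold)
    pair_counts = [{} for _ in codebook]
    y_counts = {}
    metrics = [0.0] * len(codebook)  # log2 of p_U(c_w^t | y^t) / q(c_w^t)
    for t, y in enumerate(received, start=1):
        n_y = y_counts.get(y, 0)
        for w, codeword in enumerate(codebook):
            x = codeword[t - 1]
            n_xy = pair_counts[w].get((x, y), 0)
            # One-step predictive probability of the mixture, divided by q(x)
            step = (n_xy + 0.5) / (n_y + x_size / 2.0)
            metrics[w] += math.log2(step) + math.log2(x_size)
            pair_counts[w][(x, y)] = n_xy + 1
        y_counts[y] = n_y + 1
        for w, metric in enumerate(metrics):
            if metric >= log_threshold:
                return w, t
    return None, len(received)


if __name__ == "__main__":
    codebook = [[t % 2 for t in range(20)],
                [(t // 2) % 2 for t in range(20)]]
    # Noiseless binary channel carrying codeword 0
    print(universal_sequential_decode(codebook, codebook[0], 2, 400.0))
```

In the toy run the decoder needs 14 symbols to reach A = 400, more than the log2(400), about 8.6, symbols an informed decoder would need on a noiseless binary channel; the gap is the price of learning the channel.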

In Chapter 4 we used Wald's Identity to bound the expected transmission time,
thereby obtaining an effective rate for the sequential decoder. Unfortunately,
in the universal case $p_U(\cdot|\cdot)$ is not necessarily multiplicative, so
$\log p_U(\cdot|\cdot)$ cannot be expressed as the sum of i.i.d. random
variables. Therefore, the expected transmission time in the universal case
cannot be calculated directly by applying Wald's identity. Nevertheless, as we
shall see shortly, we can use the results for the known channel case to obtain
an upper bound for the transmission time in the universal case.

1 This is also a special case of the Dirichlet distribution.

The following lemma shows that given two sequences $x^t$ and $y^t$, the universal
metric cannot be too far from the conditional probability assignment that is
optimally fitted to $x^t$ and $y^t$.



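Lemma 2. For any two sequences $x^t$ and $y^t$ we have

$$\log \frac{p_{\hat\theta}(x^t|y^t)}{p_U(x^t|y^t)} \le \frac{|\mathcal{X}||\mathcal{Y}|}{2} \log t + \beta \tag{5.7}$$

where

$$\hat\theta = \arg\max_{\theta' \in \Lambda} p_{\theta'}(x^t|y^t) \tag{5.8}$$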

and we define 

r(i/2)W 

^ =los wm (5 - 9) 



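Proof. Note that

$$p_{\hat\theta}(x^t|y^t) = \max_{\theta' \in \Lambda} \prod_{i,j} (\theta'_{ij})^{N(x^t,y^t;i,j)} \tag{5.10}$$

$$= \max_{\tilde\theta \in \tilde\Lambda} \prod_i \tilde\theta_i^{N(x^t,y^t;i,1)} \cdot \max_{\tilde\theta \in \tilde\Lambda} \prod_i \tilde\theta_i^{N(x^t,y^t;i,2)} \cdot \ldots \cdot \max_{\tilde\theta \in \tilde\Lambda} \prod_i \tilde\theta_i^{N(x^t,y^t;i,|\mathcal{Y}|)} \tag{5.11}$$

where

$$N(x^t,y^t;i,j) = |\{k : (x_k, y_k) = (i,j)\}| \tag{5.12}$$

Since both $w(\cdot)$ and $p_\theta$ are multiplicative functions, we also have

$$p_U(x^t|y^t) = \int_{\Lambda} w(\theta')\, p_{\theta'}(x^t|y^t)\, d\theta' \tag{5.13}$$

$$= \int_{\tilde\Lambda} w(\tilde\theta) \prod_i \tilde\theta_i^{N(x^t,y^t;i,1)} d\tilde\theta \cdot \int_{\tilde\Lambda} w(\tilde\theta) \prod_i \tilde\theta_i^{N(x^t,y^t;i,2)} d\tilde\theta \cdot \ldots \tag{5.14}$$

$$\cdot \int_{\tilde\Lambda} w(\tilde\theta) \prod_i \tilde\theta_i^{N(x^t,y^t;i,|\mathcal{Y}|)} d\tilde\theta \tag{5.15}$$

where

$$\tilde\Lambda = \left\{\tilde\theta \in [0,1]^{|\mathcal{X}|} \;:\; \sum_{i=1}^{|\mathcal{X}|} \tilde\theta_i = 1\right\} \tag{5.16}$$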



From [9, Lemma 1] we know that 



, ^mUXI^ < 1*1-1, f /|*p» |*r 



(5.17) 



for all $j = 1, \ldots, |\mathcal{Y}|$. Thus, we obtain



log M^V) - log II j^^pss (5 ' 18) 

(5.19) 

= M^M logt+/3 (5 ,o) 

where we define 

„ . , w „ + (M + ffil) log e - MM log w ( , 21) 

□ 



We are now ready to prove the main theorem for rateless coding over an
unknown channel.

Theorem 4. For the decoder in (5.6) with $P_e \le \epsilon$, the following
effective rate is achievable:

$$R = \frac{C\left(1 - \dfrac{|\mathcal{X}||\mathcal{Y}|/2}{\log M \ln 2}\right)}{1 + \dfrac{C + \beta - \log\epsilon + \frac{|\mathcal{X}||\mathcal{Y}|}{2}\left(\log\log M - \log C - \frac{1}{\ln 2}\right)}{\log M}} \tag{5.22}$$



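Proof. The stopping time in the above-described scheme is

$$T = \min\left\{t : \frac{p_U(x^t|y^t)}{q(x^t)} \ge A\right\} \tag{5.23}$$

Since

$$\log p_U(x^t|y^t) = \log p_\theta(x^t|y^t) - \log \frac{p_\theta(x^t|y^t)}{p_U(x^t|y^t)} \tag{5.24}$$

we have

$$T = \min\left\{t : \log \frac{p_U(x^t|y^t)}{q(x^t)} \ge \log A\right\} \tag{5.25}$$

$$= \min\left\{t : \log \frac{p_\theta(x^t|y^t)}{q(x^t)} \ge \log A + \log \frac{p_\theta(x^t|y^t)}{p_U(x^t|y^t)}\right\} \tag{5.26}$$

$$\le \min\left\{t : \log \frac{p_\theta(x^t|y^t)}{q(x^t)} \ge \log A + \log \frac{p_{\hat\theta}(x^t|y^t)}{p_U(x^t|y^t)}\right\} \tag{5.27}$$

$$\le \min\left\{t : \sum_{k=1}^{t} \log \frac{p_\theta(x_k|y_k)}{q(x_k)} \ge \log A + \frac{|\mathcal{X}||\mathcal{Y}|}{2} \log t + \beta\right\} \tag{5.28}$$

where (5.27) follows since $p_{\hat\theta}(x^t|y^t) \ge p_\theta(x^t|y^t)$ by
definition (5.8) and (5.28) follows from Lemma 2.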

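From the same considerations as in the proof of Theorem 1, at the stopping
time $T$ we necessarily have

$$\sum_{k=1}^{T} \log \frac{p_\theta(x_k|y_k)}{q(x_k)} < a + \frac{|\mathcal{X}||\mathcal{Y}|}{2} \log T + \beta + \log \frac{p_\theta(x_T|y_T)}{q(x_T)} \tag{5.29}$$

where we define $a = \log A$. By (5.29) and Wald's Identity,

$$E\{T\} = \frac{1}{C}\, E\left\{\sum_{k=1}^{T} \log \frac{p_\theta(X_k|Y_k)}{q(X_k)}\right\} \tag{5.30}$$

$$\le \frac{a + \frac{|\mathcal{X}||\mathcal{Y}|}{2} E\{\log T\} + \beta + C}{C} \tag{5.31}$$

Since $\log_2 u \le \frac{u}{v \ln 2} + \log_2 v - \frac{1}{\ln 2}$ for all
$u, v > 0$, (5.31) implies

$$E\{T\} \le \frac{a + \frac{|\mathcal{X}||\mathcal{Y}|}{2}\left(\log v - \frac{1}{\ln 2}\right) + \beta + C}{C\left(1 - \dfrac{|\mathcal{X}||\mathcal{Y}|/2}{C \cdot v \ln 2}\right)} \tag{5.32}$$

For $v = (\log M)/C$ we obtain

$$E\{T\} \le \frac{a + \frac{|\mathcal{X}||\mathcal{Y}|}{2}\left(\log\log M - \log C - \frac{1}{\ln 2}\right) + \beta + C}{C\left(1 - \dfrac{|\mathcal{X}||\mathcal{Y}|/2}{\log M \ln 2}\right)} \tag{5.33}$$

which corresponds to the following effective rate:

$$R = \frac{C \log M \left(1 - \dfrac{|\mathcal{X}||\mathcal{Y}|/2}{\log M \ln 2}\right)}{a + \frac{|\mathcal{X}||\mathcal{Y}|}{2}\left(\log\log M - \log C - \frac{1}{\ln 2}\right) + \beta + C} \tag{5.34}$$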



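Similarly to the derivation in Chapter 4, we bound the error probability by

$$\Pr\{E_m\} \le (M-1) \Pr\left\{\exists t : \frac{p_U(X^t|Y^t)}{q(X^t)} \ge A\right\} \tag{5.35}$$

where $\{X_k\}_{k=1}^{\infty}$ and $\{Y_k\}_{k=1}^{\infty}$ are independent
sequences. Define

$$\Phi_t = \begin{cases} \dfrac{p_U(X^t|Y^t)}{p_U(X^{t-1}|Y^{t-1}) \cdot q(X_t)}, & \prod_{k=1}^{t-1} \Phi_k < A \\ 1, & \text{otherwise} \end{cases}$$

We can see that

$$\left\{\exists t : \frac{p_U(X^t|Y^t)}{q(X^t)} \ge A\right\} = \left\{\prod_{t=1}^{\infty} \Phi_t \ge A\right\}$$

Furthermore, we can see that $E\{\Phi_t\} = 1$ for all $t$, since

$$E\left\{\Phi_t \;\Big|\; \prod_{k=1}^{t-1} \Phi_k \ge A\right\} = 1$$

and

$$E\left\{\Phi_t \;\Big|\; \prod_{k=1}^{t-1} \Phi_k < A\right\} = E\left\{\frac{\int_{\Lambda} w(\theta')\, p_{\theta'}(X^t|Y^t)\, d\theta'}{\int_{\Lambda} w(\theta')\, p_{\theta'}(X^{t-1}|Y^{t-1})\, d\theta' \cdot q(X_t)}\right\} = E\left\{\frac{\int_{\Lambda} w(\theta')\, p_{\theta'}(X^{t-1}|Y^{t-1}) \sum_{x_t} p_{\theta'}(x_t|Y_t)\, d\theta'}{\int_{\Lambda} w(\theta')\, p_{\theta'}(X^{t-1}|Y^{t-1})\, d\theta'}\right\} = 1$$

Here the second equality holds because $X_t$ is drawn according to $q$
independently of all other variables, so that averaging over $X_t$ cancels
$q(X_t)$, and the last equality holds because $\sum_{x_t} p_{\theta'}(x_t|y_t) = 1$
for every $\theta'$.

For an arbitrary $N$ we have

$$E\left\{\prod_{t=1}^{N} \Phi_t\right\} = E\left\{E\left\{\prod_{t=1}^{N} \Phi_t \;\Big|\; \Phi_1, \ldots, \Phi_{N-1}\right\}\right\} = E\left\{\prod_{t=1}^{N-1} \Phi_t \cdot E\{\Phi_N \mid \Phi_1, \ldots, \Phi_{N-1}\}\right\} = E\left\{\prod_{t=1}^{N-1} \Phi_t\right\} = \cdots = 1$$

Since the above holds for all $N$, we also have

$$E\left\{\prod_{t=1}^{\infty} \Phi_t\right\} = 1$$

Thus, similarly to the case of known channel, the error probability can be
bounded by

$$\Pr\{E_m\} \le (M-1) \Pr\left\{\exists t : \frac{p_U(X^t|Y^t)}{q(X^t)} \ge A\right\} = (M-1) \Pr\left\{\prod_{t=1}^{\infty} \Phi_t \ge A\right\} \le \frac{M-1}{A}$$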
Here again, we choose $A = M/\epsilon$ to obtain $P_e \le \epsilon$. Substituting
$a = \log A = \log M - \log\epsilon$ into (5.34) we finally get (5.22). □
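The rate of Theorem 4 is easy to evaluate numerically. The sketch below implements the expression obtained by substituting a = log M - log(eps) into (5.34) and dividing by log M, and compares it with a known-channel rate of the form C / (1 + (C - log eps) / log M). That known-channel form is an assumption here (it is what the universal expression reduces to when the terms involving the alphabet sizes and beta are dropped), the constant `beta` of Lemma 2 is passed in as a free parameter, and all logarithms are taken to base 2.

```python
import math


def known_channel_rate(capacity, eps, log_m):
    """Assumed known-channel effective rate: C / (1 + (C - log eps) / log M)."""
    return capacity / (1.0 + (capacity - math.log2(eps)) / log_m)


def universal_rate(capacity, eps, log_m, x_size, y_size, beta):
    """Effective rate from (5.34) with a = log M - log eps, divided by log M."""
    half_dim = x_size * y_size / 2.0
    numerator = capacity * (1.0 - half_dim / (log_m * math.log(2)))
    excess = (capacity + beta - math.log2(eps)
              + half_dim * (math.log2(log_m) - math.log2(capacity)
                            - 1.0 / math.log(2)))
    return numerator / (1.0 + excess / log_m)


if __name__ == "__main__":
    # beta = 2.0 is a placeholder value, not a number taken from the thesis
    for log_m in (100.0, 1000.0, 1e4, 1e6):
        print(log_m,
              known_channel_rate(0.5, 1e-3, log_m),
              universal_rate(0.5, 1e-3, log_m, 2, 2, 2.0))
```

Both columns converge to the capacity, with the universal rate trailing by a gap that shrinks roughly like (log log M) / log M.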

Remark 5.2. Interestingly, the upper bound on the error probability in (4.23),
obtained when the decoder uses the known channel law $p(x|y)$, applies for an
arbitrary probability assignment $p_U(x|y)$, where the only required constraint
is that the latter integrates to unity.

Remark 5.3. As in the case of known channel, we can use a randomized decoder
here to obtain the following rate:

$$R = \frac{C\left(1 - \dfrac{|\mathcal{X}||\mathcal{Y}|/2}{\log M \ln 2}\right)}{1 + \dfrac{C + \beta - \log\delta + \frac{|\mathcal{X}||\mathcal{Y}|}{2}\left(\log\log M - \log C - \frac{1}{\ln 2}\right)}{\log M}} \cdot \frac{1-\delta}{1-\epsilon} \tag{5.49}$$

for all $0 < \delta \le \epsilon$. As we mentioned in Section 4.2, if the required
error probability is small, randomization should not be applied. However, if the
error probability constraint is loose enough, a better rate may be obtained by
optimizing $\delta$ in (5.49).

5.2 Discussion 

5.2.1 Comparison to the Known Channel Case 

Having obtained achievable rates for the cases of both known and unknown
channels, it is interesting to compare these results and evaluate the rate
degradation due to the unknown channel. For the case of a known channel, the
effective rate at (4.4) can be approximated by

$$R \approx C\left(1 - \frac{C - \log\epsilon}{\log M}\right) \tag{5.50}$$

For the case of an unknown channel, we can approximate (5.22) by

|*||y|loglogM (|*||y|/2-l)/ln2 + /? + logC \ 

logM logM J \\og 2 M 

(5.51) 
Hence, the penalty for lack of channel knowledge amounts to



R — Ru 



C 



^||y| log logM (|A-||y|/2-l)/ln2 + /3 + logC 
2 logM logM 



(5.52) 



+ 



\og 2 M 



The leading term in the latter expression behaves as
$O(\log\log M/\log M) = O(\log K/K)$, with $K = \log M$, scaled by the product of
the cardinalities of the input and output of the channel. It is interesting to
compare this result with known results from universal source coding, where the
redundancy$^2$ is dominated by the cardinality of the alphabet of the source
[10] and a term that behaves as $O(\log n/n)$, where $n$ is the source length.

5.2.2 Induced Error Exponent 

Let us now examine Theorem 4 in light of the previous results. Equation (5.22) 
implies the following error exponent: 



As in the case of a known channel, we see that the error exponent is a linear
function of the rate, but an additional term of order $O(\log\log M/\log M)$ is
added. Here again, we interpret this term as a penalty for the lack of channel
knowledge at the receiver. Furthermore, by taking $M \to \infty$, we can also see
that (5.53) coincides with [8, Proposition 1].

5.2.3 Training and Channel Estimation 

In many practical applications, communication over an unknown channel is done 
by means of channel estimation. In this approach, the transmission includes 
predefined training signals, which are known to the receiver and are used to esti- 
mate the channel parameters. As an alternative to the universal communication 
scheme introduced in this chapter, we can use the following method. Prior to 
any message transmission, the transmitter sends a training sequence, which the 

2 The excess of the average codeword length above the entropy of the source. 




(5.53) 
receiver uses to estimate the channel. After the training phase, the transmitter
sends the message. The receiver uses the estimated channel parameters to decode
the message, using, for instance, the decoding rule at (4.2). A drawback of this
approach is that even after the channel estimation phase, the residual error in
the estimated channel parameters will degrade the performance of the decoder.
Furthermore, improving the channel estimation accuracy requires long training
sequences, which introduce non-negligible overhead to the transmission time.
Clearly, using training will not lead to the convergence rate of (5.52).
Chapter 6



Extensions 



6.1 Joint Source-Channel Coding 



In the previous chapters we assumed that the messages conveyed over the channel
were equiprobable, which is the case if, for instance, the source of information
has been compressed and the message $W$ is the output of the source encoder.
Assume now that the messages have arbitrary probabilities
$\pi(1), \ldots, \pi(M)$. Each message now contains a different amount of
information, which would translate into a different codeword length at the
output of the source encoder. However, in rateless codes the codeword assigned
to each message is always infinite, and the actual codeword length is
determined by the decoder. (The effective length of the message depends on the
decoder's stopping time.) It is therefore tempting to use rateless codes for an
uncompressed source and try to achieve good compression rate and reliable
communication simultaneously. To simplify matters, we begin by tackling the case
of a known channel and postpone the analysis for an unknown channel to Section
6.3. We use the following generalized version of the decoder (4.2):

$$g_n(y^n) = \begin{cases} w, & p(c_w|y^n) \ge A_w \cdot q(c_w) \\ 0, & \text{if no such } w \text{ exists} \end{cases} \tag{6.1}$$

where $A_w$ is a threshold that depends on the message $w$, and we define
$a_w = \log A_w$. Repeating the derivation for the error probability done in the
previous section, we get by Markov's inequality

$$\Pr\{E_w\} \le \sum_{w' \ne w} \frac{1}{A_{w'}} \tag{6.2}$$

By choosing

$$A_w = \frac{1}{\epsilon \cdot \pi(w)} \quad \forall w \in \mathcal{W} \tag{6.3}$$

we get a uniform bound on the error probability

$$\Pr\{E_w\} \le \epsilon \cdot \sum_{w' \ne w} \pi(w') \le \epsilon \tag{6.4}$$

which also implies

$$P_e \le \epsilon \tag{6.5}$$

Thus, for an appropriate choice of message-dependent threshold values, the 
average probability of error for the entire message set is bounded by e. Recall, 
however, that the effective rate depends on the threshold value and therefore 
needs to be reexamined here. When different thresholds are used for different 
messages, the stopping time depends on which message crosses the threshold. 
We can therefore use Wald's equation (4.9) conditioned on the true message:

$$E\{T \mid W = w\} \le \frac{a_w + C}{C} \tag{6.6}$$

where

$$a_w = \log A_w = -\log \pi(w) - \log\epsilon \tag{6.7}$$

Averaging on the entire message set, we have

$$E\{T\} = E\{E\{T \mid W\}\} \le \frac{E\{a_W\} + C}{C} = \frac{E\{-\log \pi(W)\} - \log\epsilon + C}{C} = \frac{H(W) - \log\epsilon + C}{C} \tag{6.8}$$



where $H(W)$ is the entropy rate of the source in bits per symbol. Let us now
examine (6.8) in a practical setting. Suppose that we wish to convey blocks of
$K$ source bits with fixed probability of error $\epsilon > 0$. Since every
source symbol contains $\log M$ bits, $K/\log M$ source symbols will be needed.
Thus, the rate at which source bits can be conveyed over the channel will be

$$R = \frac{K}{E\{T\}} \ge \frac{K \cdot C}{\frac{K}{\log M} H(W) - \log\epsilon + C} \tag{6.9}$$

$$= \frac{C}{\mathcal{H}(W) + \dfrac{C - \log\epsilon}{K}} \tag{6.10}$$

$$= \frac{C}{\mathcal{H}(W)} \cdot \frac{1}{1 + \dfrac{C - \log\epsilon}{\mathcal{H}(W) \cdot K}} \tag{6.11}$$

where we define $\mathcal{H}(W) = H(W)/\log M$ as the per-bit entropy of the source.
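The message-dependent thresholds and the resulting bounds can be illustrated with a few lines of code. The sketch below computes the thresholds A_w = 1/(eps * pi(w)), the bound (H(W) - log eps + C)/C on the expected transmission time, and the source-bit rate C / (H + (C - log eps)/K), where H stands for the per-bit entropy. All function names are hypothetical, logarithms are base 2, and the message probabilities are assumed to be strictly positive.

```python
import math


def thresholds(pmf, eps):
    """Message-dependent thresholds A_w = 1 / (eps * pi(w))."""
    return [1.0 / (eps * p) for p in pmf]


def entropy_bits(pmf):
    """Entropy H(W) in bits."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)


def expected_time_bound(pmf, eps, capacity):
    """Upper bound (H(W) - log eps + C) / C on the expected stopping time."""
    return (entropy_bits(pmf) - math.log2(eps) + capacity) / capacity


def source_bit_rate(capacity, per_bit_entropy, eps, k_bits):
    """Rate at which K source bits are conveyed: C / (H + (C - log eps) / K)."""
    return capacity / (per_bit_entropy + (capacity - math.log2(eps)) / k_bits)


if __name__ == "__main__":
    pmf = [0.5, 0.25, 0.25]
    print(thresholds(pmf, 0.01))
    print(expected_time_bound(pmf, 0.01, 1.0))
    print(source_bit_rate(1.0, 0.5, 0.01, 1000))
```

Likely messages receive low thresholds and are therefore acknowledged early, which is how the decoder alone realizes the compression gain.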

Note that the encoder used here, as well as the codebook, are the same ones
defined in Chapter 2 and the only change is in the definition of the decoder. The
encoder is uninformed of the statistics of the source or the capacity of the
channel, yet the rate approaches the optimum rate achievable by an informed
encoder. We note the practical implication of such a scheme: the compression
algorithms can be implemented and maintained at the decoder, while the encoder
remains simple and source-independent.



6.2 Source Coding with Side Information 

Suppose now that the source of information emits independent pairs of messages
$(W_1, W_2) \in \mathcal{W}_1 \times \mathcal{W}_2$ according to a probability
distribution $\pi_{W_1 W_2}(w_1, w_2)$, which are encoded separately and pass
through a noiseless channel. Suppose that $R_1$ and $R_2$ are the coding rates
of $W_1$ and $W_2$, respectively. By the Slepian-Wolf theorem, if $W_1$ is
encoded with rate $R_1 \ge H(W_1)$, then $W_2$ can be encoded independently with
$R_2 = H(W_2|W_1)$. (This rate pair is a corner point in the achievable rate
region.) We will now show that using rateless codes, we can approach this rate
with some redundancy due to the usage of a finite message set. The encoder of
$W_1$ assigns to each message in $\mathcal{W}_1$ an infinite codeword
$c_{w_1} \in \{0,1\}^{\infty}$, $w_1 = 1, \ldots, |\mathcal{W}_1|$, and transmits
it over the channel. The encoder of $W_2$ operates similarly to that of $W_1$
and independently of it, with codewords $d_{w_2} \in \{0,1\}^{\infty}$,
$w_2 = 1, \ldots, |\mathcal{W}_2|$. The codewords are assumed to be i.i.d.
Bernoulli(1/2) sequences. To reconstruct $W_1$, the decoder can use the decision
rule (6.1) to obtain an error probability of

$$\Pr\{\hat{W}_1 \ne W_1\} \le \frac{\epsilon}{2} \tag{6.13}$$

Since a binary code is used and the channel is noiseless, we have $C = 1$, so
(6.8) implies that the expected transmission time for $W_1$ satisfies

$$R_1 = E\{T_1\} \le H(W_1) - \log\frac{\epsilon}{2} + 1 \tag{6.14}$$

Note that the coding rate is defined here as the average codeword length for the
message set. Therefore, the effective rate equals the expected transmission time,
rather than its reciprocal as in channel coding.



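Having decoded message $W_1$, the decoder uses the following decision rule to
reconstruct $W_2$:

$$g_n(y^n) = \begin{cases} w_2, & \sum_{k=1}^{n} z_{w_2,k} \ge a(\hat{w}_1, w_2) \\ 0, & \text{if no such } w_2 \text{ exists} \end{cases} \tag{6.15}$$

where $a(w_1, w_2) = \log A(w_1, w_2)$ is a threshold that depends on both
messages, and

$$z_{w_2,k} = \log \frac{p(y_k|d_{w_2,k})}{p(y_k)}, \quad k = 1, \ldots, n \tag{6.16}$$

Similar derivation for the error probability as in Section 6.1 yields

$$\Pr\{\hat{W}_2 \ne w_2 \mid W_1 = w_1, W_2 = w_2\} \le \sum_{w_2' \ne w_2} \frac{1}{A(w_1, w_2')} \tag{6.17}$$

We choose

$$A(w_1, w_2) = \frac{2}{\epsilon \cdot \pi_{W_2|W_1}(w_2|w_1)} \tag{6.18}$$

so that

$$\Pr\{\hat{W}_2 \ne w_2 \mid W_1 = w_1, W_2 = w_2\} \le \frac{\epsilon}{2} \cdot \sum_{w_2' \ne w_2} \pi_{W_2|W_1}(w_2'|w_1) \le \frac{\epsilon}{2} \tag{6.19}$$

Therefore,

$$\Pr\{\hat{W}_2 \ne W_2\} \le \frac{\epsilon}{2} \tag{6.20}$$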
Using (6.13), (6.20) and the union bound, we have

$$\Pr\{\hat{W}_1 \ne W_1 \cup \hat{W}_2 \ne W_2\} \le \epsilon \tag{6.21}$$

Since $a(w_1, w_2) = -\log(\epsilon/2) - \log \pi_{W_2|W_1}(w_2|w_1)$, we can use
Wald's equation for the stopping time of decoding $W_2$ to obtain

$$R_2 = E\{T_2\} = E\{E\{T_2 \mid W_1, W_2\}\} \le E\{a(W_1, W_2)\} + 1 = E\{-\log \pi_{W_2|W_1}(W_2|W_1)\} - \log\frac{\epsilon}{2} + 1 = H(W_2|W_1) - \log\frac{\epsilon}{2} + 1 \tag{6.22}$$

Combining (6.14) and (6.22), we get

$$R_1 + R_2 \le H(W_1, W_2) - 2\log\frac{\epsilon}{2} + 2 \tag{6.23}$$

Similarly to Section 6.1, if we take blocks of $K$ source bits and a fixed error
probability $\epsilon > 0$, we obtain

$$R_1 + R_2 = H(W_1, W_2) \cdot \left(1 + O\left(\frac{1}{K}\right)\right) \tag{6.24}$$
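The redundancy over the Slepian-Wolf corner point can be computed directly from a joint distribution. The sketch below returns the bounds H(W1) - log(eps/2) + 1 and H(W2|W1) - log(eps/2) + 1 together with the joint entropy. The function name and the representation of the joint distribution as a nested list `joint[w1][w2]` are assumptions of this example; logarithms are base 2.

```python
import math


def slepian_wolf_rate_bounds(joint, eps):
    """Return (R1 bound, R2 bound, H(W1, W2)) for a joint pmf joint[w1][w2]."""
    marginal_1 = [sum(row) for row in joint]
    h1 = -sum(p * math.log2(p) for p in marginal_1 if p > 0)
    h12 = -sum(p * math.log2(p) for row in joint for p in row if p > 0)
    h2_given_1 = h12 - h1            # chain rule for entropy
    overhead = -math.log2(eps / 2) + 1
    return h1 + overhead, h2_given_1 + overhead, h12


if __name__ == "__main__":
    # W1 is a fair bit and W2 agrees with W1 with probability 0.9
    joint = [[0.45, 0.05], [0.05, 0.45]]
    print(slepian_wolf_rate_bounds(joint, 0.01))
```

For a single pair of bits the overhead dominates; the bounds only become meaningful when the messages are long blocks, which is the regime described by the O(1/K) statement.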



6.3 Complete Universality 



We now consider the case of joint source-channel coding of an unknown source
over an unknown channel, with an unknown amount of side-information at the
receiver. Initially, we bring together the results of the previous sections to
obtain a communication scheme for a source with unknown statistics over an
unknown channel. As a straightforward generalization of the universal coding
scheme in Section 5.1, we use a fusion of the decoders (5.6) and (6.1), i.e.

$$g_n(y^n) = \begin{cases} w, & p_U(c_w|y^n) \ge A_w \cdot q(c_w) \\ 0, & \text{if no such } w \text{ exists} \end{cases} \tag{6.25}$$

where

$$A_w = \frac{1}{\epsilon \cdot \pi(w)} \tag{6.26}$$
Similar derivation to those done in Sections 5.1 and 6.1 yields the following
rate for an uncompressed source $W \in \{1, \ldots, M\}$ over an unknown channel
with capacity $C$:

$$R = \frac{C\left(1 - \dfrac{|\mathcal{X}||\mathcal{Y}|/2}{\log M \ln 2}\right)}{\mathcal{H}(W) + \dfrac{C + \beta - \log\epsilon + \frac{|\mathcal{X}||\mathcal{Y}|}{2}\left(\log\log M - \log C - \frac{1}{\ln 2}\right)}{\log M}} \tag{6.27}$$

where $\mathcal{H}(W)$ is defined in Section 6.1. We note that while the encoder
can be ignorant of the source statistics, the decoder needs to know $\pi(w)$,
$w \in \mathcal{W}$.



We now go one step further and assume that the decoder has no knowledge of
the statistics of the source or the channel. Suppose that the source $S$
generates sequences of $L$ symbols from an alphabet $\mathcal{S}$, drawn i.i.d.
according to a set of $|\mathcal{S}|$ unknown probabilities $\gamma$. Each
sequence is encoded as one message, hence $M = |\mathcal{S}|^L$. Instead of using
the set of thresholds (6.26), which depends on the unknown probabilities, we use
a universal probability measure [10]

$$\hat\pi(s^L) = \int u(\gamma)\, \pi_\gamma(s^L)\, d\gamma \tag{6.28}$$

so that

$$a_w = \log A_w = -\log\epsilon - \log \hat\pi(s^L) \tag{6.29}$$

If the weight function $u(\cdot)$ is chosen to be Jeffreys prior, we get (see
[10, Eq.17])

$$E\{a_W\} = -\log\epsilon + H(W) + \frac{|\mathcal{S}|-1}{2} \log \frac{L}{2\pi e} + O(1) \tag{6.30}$$
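The universal threshold can be computed without knowing the source statistics, since the Jeffreys (Dirichlet(1/2)) mixture of i.i.d. distributions has a closed form in terms of Gamma functions. The sketch below computes the log-threshold -log(eps) - log(mixture probability) for a given source sequence, and the redundancy of the mixture over the empirical entropy of the sequence. Function names and the convention that source symbols are integers 0, ..., alphabet_size - 1 are assumptions of this example; logarithms are base 2.

```python
import math
from collections import Counter


def log2_jeffreys_mixture(sequence, alphabet_size):
    """log2 of the Dirichlet(1/2) mixture probability of an i.i.d. sequence."""
    counts = Counter(sequence)
    n, k = len(sequence), alphabet_size
    total = math.lgamma(k / 2) - k * math.lgamma(0.5) - math.lgamma(n + k / 2)
    for symbol in range(k):
        total += math.lgamma(counts[symbol] + 0.5)
    return total / math.log(2)


def universal_log_threshold(sequence, alphabet_size, eps):
    """Log-threshold a_w = -log2(eps) - log2(mixture probability)."""
    return -math.log2(eps) - log2_jeffreys_mixture(sequence, alphabet_size)


def redundancy_bits(sequence, alphabet_size):
    """Excess of the mixture code length over L times the empirical entropy."""
    n = len(sequence)
    counts = Counter(sequence)
    empirical = -sum(c * math.log2(c / n) for c in counts.values())
    return -log2_jeffreys_mixture(sequence, alphabet_size) - empirical


if __name__ == "__main__":
    seq = [0, 1] * 64
    print(universal_log_threshold(seq, 2, 0.01), redundancy_bits(seq, 2))
```

For this balanced binary sequence of length 128 the redundancy is a few bits, on the order of ((|S| - 1)/2) log L, in line with the logarithmic term of (6.30).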

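Hence, similarly to (6.27) we can achieve the following rate

$$R = \frac{C\left(1 - \dfrac{|\mathcal{X}||\mathcal{Y}|/2}{\log M \ln 2}\right)}{\hat{\mathcal{H}}(W) + \dfrac{C + \beta - \log\epsilon + \frac{|\mathcal{X}||\mathcal{Y}|}{2}\left(\log\log M - \log C - \frac{1}{\ln 2}\right)}{\log M}} \tag{6.31}$$

where

$$\hat{\mathcal{H}}(W) = \mathcal{H}(W) + \frac{|\mathcal{S}|-1}{2} \cdot \frac{\log L}{\log M} + O\left(\frac{1}{\log M}\right) \tag{6.32}$$

Recall that $L = \log_{|\mathcal{S}|} M$, so

$$\hat{\mathcal{H}}(W) = \mathcal{H}(W) + \frac{|\mathcal{S}|-1}{2} \cdot \frac{\log\log M}{\log M} + O\left(\frac{1}{\log M}\right) \tag{6.33}$$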
By plugging (6.33) into (6.31) we get

$$R = \frac{C}{\mathcal{H}(W)} \left(1 - O\left(\frac{\log K}{K}\right)\right) \tag{6.34}$$

where $K = \log M$ is the number of encoded bits. Comparing (6.34) to (6.12), we
see that the leading term is unchanged and equals the optimal rate achievable
by separated source-channel coding. However, the lack of information affects the
rate of convergence, which is now dominated by an $O(\log K/K)$ term, as opposed
to $O(1/K)$ for an informed decoder.

The implications of the latter result are far-reaching. We have shown that
even if the statistics of both the channel and the source are unknown to the
decoder, rateless coding not only achieves the best source-channel coding rate
as $M \to \infty$, but it also has the same asymptotics as a rateless scheme with
an informed decoder. This observation has been made in [1, Ch.4] for infinitely
large message sets. The results obtained here coincide with those of [1], and
also quantify the redundancy caused by the lack of information on the source and
the channel, and by the use of finite blocks.

Unknown Side Information at the Decoder 

Similarly to Section 6.2, if the source contains side information $V$ that is
known non-causally at the decoder, we can further improve the communication
rate. Combining the technique from Section 6.2 with the derivation above, we
obtain the following rate for universal joint source-channel coding with side
information at the decoder:

where $\mathcal{H}(W|V)$ is the conditional entropy of the source $W$ given the
side information $V$, normalized by $\log M$. Since
$\mathcal{H}(W|V) \le \mathcal{H}(W)$, the side information improves the rate,
even if the encoder is uninformed of the amount (or the existence) of the side
information.
Chapter 7
Summary 



In this study we developed and analyzed several communication schemes that are
all based on the concept of rateless codes. In rateless codes, each codeword has
an infinite length and the decoding length is dynamically determined by the
confidence level of the decoder. Throughout this study, we allowed the coding
schemes to have a fixed error probability, while aiming to achieve the shortest
mean transmission time, or equivalently, the highest rate. This approach is
different from the prevalent one, in which the communication rate is held fixed
and the codebook is enlarged indefinitely so that the error probability
vanishes. We demonstrated how rateless codes, combined with sequential decoding,
can be used in basic communication scenarios such as communication over a DMC,
but can also be used to solve more complex problems, such as communication over
an unknown channel. The decoding methods introduced here enabled us to obtain
results for a finite message set, while previous studies were restricted to
asymptotic results.

We began by describing rateless codes and surveyed some previous results
related to such coding schemes. Then, we introduced the sequential decoder that
uses a known channel law. Using Wald's theory and the notion of stopping time,
we obtained an upper bound for the mean transmission time for a fixed error
probability, and the resulting effective rate is shown to approach the capacity
of the channel as the size of the message set, M, grows. We also obtained an
upper bound for the rate for a fixed error probability. The upper bound is not
tight for small M, but it converges to the achievable rate as $M \to \infty$. We
conjecture that a stronger converse can be found, which will be tighter also in
the non-asymptotic realm. Although we developed the above-mentioned scheme for a
DMC, we also demonstrated that it is applicable in a memoryless Gaussian channel.

For the case of an unknown channel we introduced a novel decoding metric.
Unlike previous studies, the universal decoding metric is not based on empirical
mutual information, but on a mixture probability assignment. For an appropriate
choice of mixture, we were able to bound the difference between the universal
metric and the one used by an informed decoder. Thus, we used the results
obtained for an informed decoder to upper bound the mean transmission time in
the universal case.

We then applied rateless coding to more advanced scenarios. We showed how 
with only a minor change in the sequential decoder, we can easily use rateless 
codes as a joint source-channel coding scheme. We also used rateless coding 
for source coding with side information, obtaining the optimum Slepian-Wolf 
rate for this setting. Finally, we combined the techniques for universal channel 
coding, joint source-channel coding and source coding with side information and 
demonstrated that even without any information on the source, the channel or 
the amount (or even the existence) of side information — reliable communication 
is feasible, and the rate can be analyzed even for a finite message set. 
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